Efficient Skyline Computation over Low-Cardinality Domains
Abstract
Current skyline evaluation techniques follow a common paradigm that eliminates data elements from skyline consideration by finding other elements in the dataset that dominate them. The performance of such techniques is heavily influenced by the underlying data distribution (i.e. whether the dataset attributes are correlated, independent, or anti-correlated). In this paper, we propose the Lattice Skyline Algorithm (LS) that is built around a new paradigm for skyline evaluation on datasets with attributes that are drawn from low-cardinality domains. LS continues to apply even if one attribute has high cardinality. Many skyline applications naturally have such data characteristics, and previous skyline methods have not exploited this property. We show that for typical dimensionalities, the complexity of LS is linear in the number of input tuples. Furthermore, we show that the performance of LS is independent of the input data distribution. Finally, we demonstrate through extensive experimentation on both real and synthetic datasets that LS can result in a significant performance advantage over existing techniques.
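To make the lattice idea concrete, here is a minimal sketch of skyline evaluation over low-cardinality integer attributes. It is an illustration written under stated assumptions, not the LS algorithm as specified in the paper: the function name `lattice_skyline`, the convention that smaller values are preferred, and the encoding of each attribute as an integer in `range(cardinality)` are all hypothetical choices made for this example. The sketch marks which lattice cells are occupied, sweeps the lattice once to find cells dominated by an occupied cell, and then reports the tuples in occupied, non-dominated cells.

```python
from itertools import product


def lattice_skyline(tuples, cardinalities):
    """Return the skyline of `tuples` (smaller values preferred; assumption).

    Hypothetical helper: each tuple is a sequence of ints with
    tuples[i][j] in range(cardinalities[j]).
    """
    dims = len(cardinalities)

    # Pass 1 over the data: record which lattice cells hold at least one tuple.
    present = {tuple(t) for t in tuples}

    # Sweep over the lattice in lexicographic order. covered[cell] is True
    # when some occupied cell is <= cell in every coordinate (cell included).
    # Each immediate predecessor of a cell comes earlier in this order, so
    # its entry is already filled in when it is needed.
    covered = {}
    skyline_cells = set()
    for cell in product(*(range(c) for c in cardinalities)):
        dominated = False
        for j in range(dims):
            if cell[j] > 0:
                pred = cell[:j] + (cell[j] - 1,) + cell[j + 1:]
                if covered[pred]:
                    # An occupied cell is at least as good everywhere and
                    # strictly better in coordinate j.
                    dominated = True
                    break
        if cell in present and not dominated:
            skyline_cells.add(cell)
        covered[cell] = dominated or cell in present

    # Pass 2 over the data: keep tuples whose cell survived the sweep.
    return [t for t in tuples if tuple(t) in skyline_cells]


if __name__ == "__main__":
    data = [(0, 2), (1, 1), (2, 0), (2, 2), (1, 2)]
    print(lattice_skyline(data, (3, 3)))
```

In this sketch the data is scanned twice and the lattice once, so the cost is proportional to the number of tuples plus the number of lattice cells times the dimensionality. When the product of the attribute cardinalities is small, the work per tuple does not depend on how the attribute values are distributed, which matches the behaviour the abstract describes.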
Similar resources
Skyline Evaluation Within Join Operation, Block Nested Loop Join Implementation
The naïve skyline join approach computes the join first and then applies skyline computation to find the corresponding skyline objects. As the cardinality and dimensionality of the joined table increase, computing the skyline over a non-reductive join relation becomes costlier than computing it over a single table. Most of the existing work on skyline queries for databases mainly discusses the...
Efficient Skycube Computation Using Bitmaps Derived from Indexes
TAMBARAM KAILASAM, GAYATHRI. Efficient Skycube Computation using Bitmaps derived from Indexes. (Under the direction of Dr. Jaewoo Kang.) Skyline queries have been increasingly used in multi-criteria decision making and data mining applications. They retrieve a set of interesting points from a potentially large set of data points. A point is said to be interesting if it is as good or better in a...
Discovering Representative Skyline Points over Distributed Data
Skyline queries help users make intelligent decisions over complex data. The main shortcoming of skyline queries is that the cardinality of the result set is not known a-priori. To overcome this limitation, the representative skyline query has been proposed, which retrieves a fixed set of k skyline points that best describe all skyline points. Even though the representative skyline has been stu...
Getting Prime Cuts from Skylines over Partially Ordered Domains
Skyline queries have recently received a lot of attention due to their intuitive query formulation: users can state preferences with respect to several attributes. Unlike numerical preferences, preferences over discrete value domains do not show an inherent total order, but have to rely on partial orders as stated by the user. In such orders typically many object values are incomparable, increa...
Efficient Algorithms for Similarity and Skyline Summary on Multidimensional Datasets
Efficient management of large multidimensional datasets has attracted much attention in the database research community. Such large multidimensional datasets are common and efficient algorithms are needed for analyzing these data sets for a variety of applications. In this thesis, we focus our study on two very common classes of analysis: similarity and skyline summarization. We first focus on ...